Paraphrase Lattice for Statistical Machine Translation
Authors
Abstract
Lattice decoding in statistical machine translation (SMT) is useful for speech translation and for translating German because it can handle input ambiguities, such as speech recognition ambiguities and German word segmentation ambiguities. We show that lattice decoding is also useful for handling input variations. Given an input sentence, we build a lattice that represents paraphrases of the input sentence; we call this a paraphrase lattice. We then give the paraphrase lattice as input to the lattice decoder, which selects the best path for decoding. Using these paraphrase lattices as inputs, we obtained significant gains in BLEU scores on the IWSLT and Europarl datasets.
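The abstract does not say how paraphrases are obtained or how the decoder scores lattice paths, so the following is only a minimal sketch under stated assumptions. It assumes a small hand-written paraphrase table (source phrase to a list of paraphrases with probabilities), keeps the original sentence as one path through the lattice, and adds each matching paraphrase as an alternative path between the same word boundaries. A real lattice decoder scores paths jointly with its translation and language models; here a hypothetical `edge_score` that penalizes words unknown to the translation model stands in for the decoder, which is enough to show how choosing a path can route around an unknown input word. All names (`build_paraphrase_lattice`, `best_path`, `edge_score`, the table, and the vocabulary) are illustrative assumptions.

```python
import math
from collections import defaultdict


def build_paraphrase_lattice(words, table, max_len=3):
    """Build a paraphrase lattice as a DAG.

    Nodes 0..len(words) are boundaries between input words. Returns
    (edges, final_node), where edges maps node -> [(next_node, word, log_prob)].
    `table` (assumed format) maps a tuple of source words to a list of
    (paraphrase_tuple, probability) pairs.
    """
    n = len(words)
    edges = defaultdict(list)
    next_id = n + 1  # fresh ids for nodes inside multi-word paraphrases
    # The original sentence is always one path, with log-probability 0 per edge.
    for i, w in enumerate(words):
        edges[i].append((i + 1, w, 0.0))
    # Add every paraphrase of every source span words[i:j] as a parallel path i -> j.
    for i in range(n):
        for j in range(i + 1, min(n, i + max_len) + 1):
            for para, prob in table.get(tuple(words[i:j]), []):
                src = i
                for k, token in enumerate(para):
                    last = k == len(para) - 1
                    dst = j if last else next_id
                    if not last:
                        next_id += 1
                    # Charge the paraphrase log-probability once, on its first edge.
                    edges[src].append((dst, token, math.log(prob) if k == 0 else 0.0))
                    src = dst
    return edges, n


def best_path(edges, start, final, edge_score):
    """Return (score, words) of the highest-scoring path from start to final."""
    memo = {}

    def solve(node):
        if node == final:
            return 0.0, []
        if node in memo:
            return memo[node]
        best = (-math.inf, [])
        for nxt, word, logp in edges.get(node, []):
            tail_score, tail_words = solve(nxt)
            score = edge_score(word, logp) + tail_score
            if score > best[0]:
                best = (score, [word] + tail_words)
        memo[node] = best
        return best

    return solve(start)


# Illustrative input: the (hypothetical) translation model knows "reserve" but not "book".
words = ["i", "want", "to", "book", "a", "room"]
table = {
    ("book",): [(("reserve",), 0.6)],
    ("a", "room"): [(("a", "single", "room"), 0.2)],
}
vocab = {"i", "want", "to", "reserve", "a", "room"}


def edge_score(word, logp, oov_penalty=-10.0):
    # Stand-in for the decoder: paraphrase log-prob plus a penalty for unknown words.
    return logp + (0.0 if word in vocab else oov_penalty)


edges, final = build_paraphrase_lattice(words, table)
score, path = best_path(edges, 0, final, edge_score)
print(path, round(score, 3))
```

Because the original words carry log-probability 0, the unparaphrased sentence stays available and is chosen whenever no paraphrase improves the score; in this toy example the path through "reserve" wins because it avoids the unknown word "book".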
Similar resources
Comparing Phrase-based and Syntax-based Paraphrase Generation
Paraphrase generation can be regarded as machine translation where source and target language are the same. We use the Moses statistical machine translation toolkit for paraphrasing, comparing phrase-based to syntax-based approaches. Data is derived from a recently released, large-scale (2.1M tokens) paraphrase corpus for Dutch. Preliminary results indicate that the phrase-based approach perfor...
Introduction of a new paraphrase generation tool based on Monte-Carlo sampling
We propose a new, specifically designed method for paraphrase generation based on Monte-Carlo sampling and show how this algorithm is suited to the task. Moreover, the basic algorithm presented here leaves many opportunities for future improvement. In particular, unlike Viterbi-based decoders, our algorithm does not constrain the scoring function. It is now possible to use some gl...
Diverse Words, Shared Meanings: Statistical Machine Translation for Paraphrase, Grounding, and Intent
Can two different descriptions refer to the same event or action? Recognising that dissimilar strings are equivalent in meaning for some purpose is something that humans do rather well, but it is a task at which machines often fail. In the Natural Language Processing Group at Microsoft Research, we are attempting to address this challenge at sentence scale by generating semantically equivalent ...
Support Vector Machines for Paraphrase Identification and Corpus Construction
The lack of readily-available large corpora of aligned monolingual sentence pairs is a major obstacle to the development of Statistical Machine Translation-based paraphrase models. In this paper, we describe the use of annotated datasets and Support Vector Machines to induce larger monolingual paraphrase corpora from a comparable corpus of news clusters found on the World Wide Web. Features inc...
Neural Paraphrase Generation using Transfer Learning
Progress in statistical paraphrase generation has been hindered for a long time by the lack of large monolingual parallel corpora. In this paper, we adapt the neural machine translation approach to paraphrase generation and perform transfer learning from the closely related task of entailment generation. We evaluate the model on the Microsoft Research Paraphrase (MSRP) corpus and show that the ...